2 Data Manipulation & Visualization
Required packages
pkgs <- c("sf", "gstat", "mapview", "nngeo", "rnaturalearth", "dplyr",
          "nomisr", "osmdata", "tidyr", "texreg", "downlit", "xml2")
lapply(pkgs, require, character.only = TRUE)
For mapping
pkgs <- c("tmap", "tmaptools", "viridisLite",
"ggplot2", "ggthemes", "rmapshaper")
lapply(pkgs, require, character.only = TRUE)
Session info
sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.utf8
[2] LC_CTYPE=English_United Kingdom.utf8
[3] LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.utf8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] rmapshaper_0.5.0 ggthemes_4.2.4 ggplot2_3.4.2
[4] viridisLite_0.4.2 tmaptools_3.1-1 tmap_3.3-3
[7] xml2_1.3.4 texreg_1.38.6 tidyr_1.3.0
[10] osmdata_0.2.3 nomisr_0.4.7 dplyr_1.1.2
[13] rnaturalearth_0.3.3 nngeo_0.4.7 mapview_2.11.0
[16] gstat_2.1-1 sf_1.0-13
loaded via a namespace (and not attached):
[1] gtable_0.3.3 xfun_0.39 raster_3.6-20
[4] htmlwidgets_1.6.2 lattice_0.21-8 vctrs_0.6.3
[7] tools_4.3.1 crosstalk_1.2.0 generics_0.1.3
[10] curl_5.0.1 parallel_4.3.1 stats4_4.3.1
[13] tibble_3.2.1 proxy_0.4-27 spacetime_1.3-0
[16] fansi_1.0.4 xts_0.13.1 pkgconfig_2.0.3
[19] KernSmooth_2.23-21 satellite_1.0.4 data.table_1.14.8
[22] RColorBrewer_1.1-3 leaflet_2.1.2 webshot_0.5.4
[25] lifecycle_1.0.3 stringr_1.5.0 compiler_4.3.1
[28] FNN_1.1.3.2 rsdmx_0.6-2 munsell_0.5.0
[31] terra_1.7-29 leafsync_0.1.0 codetools_0.2-19
[34] snakecase_0.11.0 stars_0.6-1 htmltools_0.5.5
[37] class_7.3-22 pillar_1.9.0 classInt_0.4-9
[40] lwgeom_0.2-13 abind_1.4-5 tidyselect_1.2.0
[43] digest_0.6.31 stringi_1.7.12 purrr_1.0.1
[46] fastmap_1.1.1 grid_4.3.1 colorspace_2.1-0
[49] cli_3.6.1 magrittr_2.0.3 base64enc_0.1-3
[52] dichromat_2.0-0.1 XML_3.99-0.14 utf8_1.2.3
[55] leafem_0.2.0 e1071_1.7-13 withr_2.5.0
[58] scales_1.2.1 sp_1.6-1 rmarkdown_2.22
[61] httr_1.4.6 zoo_1.8-12 png_0.1-8
[64] evaluate_0.21 knitr_1.43 V8_4.3.0
[67] rlang_1.1.1 Rcpp_1.0.10 glue_1.6.2
[70] DBI_1.1.3 jsonlite_1.8.5 R6_2.5.1
[73] plyr_1.8.8 intervals_0.15.3 units_0.8-2
Reload data from previous session
load("_data/msoa_spatial.RData")
load("_data/ulez_spatial.RData")
load("_data/pollution_spatial.RData")
load("_data/pubs_spatial.RData")
2.1 Manipulation and linkage
Having data with geo-spatial information allows us to perform a variety of methods to manipulate and link different data sources. Commonly used methods include 1) subsetting, 2) point-in-polygon operations, 3) distance measures, and 4) intersection and buffer methods.
The online vignettes of the sf package provide a comprehensive overview of the many ways of manipulating spatial data.
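These operations all build on sf's binary predicate functions (st_intersects(), st_disjoint(), st_within(), ...). A minimal self-contained sketch on toy geometries (not the London data) shows the sparse index lists they return:

```r
library(sf)

# Two unit squares; the second is shifted so it only partly overlaps the first
p1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
p2 <- st_polygon(list(rbind(c(0.5, 0), c(1.5, 0), c(1.5, 1), c(0.5, 1), c(0.5, 0))))
polys <- st_sfc(p1, p2)

# One point inside the overlap, one point far outside both squares
pts <- st_sfc(st_point(c(0.75, 0.5)), st_point(c(3, 3)))

st_intersects(pts, polys) # sparse index list: which polygons each point touches
st_disjoint(pts, polys)   # the complement of st_intersects
st_within(pts, polys)     # containment (for points, identical to st_intersects)
```

The subsetting, point-in-polygon, and indicator operations below are all convenience wrappers around these predicates.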
Check if the data are on a common projection
st_crs(msoa.spdf) == st_crs(pol.spdf)
[1] FALSE
st_crs(msoa.spdf) == st_crs(pubs.spdf)
[1] FALSE
st_crs(msoa.spdf) == st_crs(ulez.spdf)
[1] FALSE
# MSOA in different crs --> transform
pol.spdf <- st_transform(pol.spdf, crs = st_crs(msoa.spdf))
pubs.spdf <- st_transform(pubs.spdf, crs = st_crs(msoa.spdf))
ulez.spdf <- st_transform(ulez.spdf, crs = st_crs(msoa.spdf))
# Check if all geometries are valid, and make valid if needed
msoa.spdf <- st_make_valid(msoa.spdf)
2.1.1 Subsetting
We can subset spatial data in a similar way as we subset conventional data.frames or matrices. For instance, below we simply reduce the pollution grid across the UK to observations in London only.
# Subset to pollution estimates in London
pol_sub.spdf <- pol.spdf[msoa.spdf, ] # or:
pol_sub.spdf <- st_filter(pol.spdf, msoa.spdf)
mapview(pol_sub.spdf)
Or we can reverse the above and exclude all intersecting units by specifying st_disjoint as an alternative spatial operation via the op = option (note the empty argument position for the column selection). st_filter() with the .predicate option does the same job. See the sf vignette for more operations.
# Subset pubs to pubs not in the ulez area
sub2.spdf <- pubs.spdf[ulez.spdf, , op = st_disjoint] # or:
sub2.spdf <- st_filter(pubs.spdf, ulez.spdf, .predicate = st_disjoint)
mapview(sub2.spdf)
# We can easily create indicators of whether an MSOA is within the ulez area or not
msoa.spdf$ulez <- 0
within <- msoa.spdf[ulez.spdf,]
msoa.spdf$ulez[which(msoa.spdf$MSOA11CD %in% within$MSOA11CD)] <- 1
table(msoa.spdf$ulez)
0 1
954 29
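An equivalent shortcut uses st_intersects() directly, which returns for each MSOA the (possibly empty) set of intersecting ulez features. This is a sketch assuming the msoa.spdf and ulez.spdf objects loaded above:

```r
# 1 if the MSOA intersects any ulez feature, 0 otherwise
msoa.spdf$ulez <- as.integer(lengths(st_intersects(msoa.spdf, ulez.spdf)) > 0)
table(msoa.spdf$ulez)
```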
2.1.2 Point in polygon
We are interested in the number of pubs in each MSOA. So, we count the number of points in each polygon.
# Assign MSOA to each point
pubs_msoa.join <- st_join(pubs.spdf, msoa.spdf, join = st_within)
# Count N by MSOA code (drop geometry to speed up)
pubs_msoa.join <- dplyr::count(st_drop_geometry(pubs_msoa.join),
                               MSOA11CD,
                               name = "pubs_count")
sum(pubs_msoa.join$pubs_count)
[1] 1601
# Merge and replace NAs with zero (no matches, no pubs)
msoa.spdf <- merge(msoa.spdf, pubs_msoa.join,
by = "MSOA11CD", all.x = TRUE)
msoa.spdf$pubs_count[is.na(msoa.spdf$pubs_count)] <- 0
2.1.3 Distance measures
We might be interested in the distance to the nearest pub. Here, we use the package nngeo to find the k nearest neighbours along with their distances.
# Use geometric centroid of each MSOA
cent.sp <- st_centroid(msoa.spdf[, "MSOA11CD"])
Warning: st_centroid assumes attributes are constant over
geometries
# Get K nearest neighbour with distance
knb.dist <- st_nn(cent.sp, pubs.spdf,
k = 1, returnDist = TRUE,
progress = FALSE)
projected points
msoa.spdf$dist_pubs <- unlist(knb.dist$dist)
summary(msoa.spdf$dist_pubs)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
   9.079  305.149  565.018  701.961  948.047 3735.478
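Plain sf can do the same job without nngeo: st_nearest_feature() returns the index of the closest feature, and st_distance() with by_element = TRUE the corresponding distance. A self-contained sketch on toy points, with the analogous call for the MSOA data shown as a comment:

```r
library(sf)

# Toy data: two origin points and three candidate destinations on a line
orig <- st_sfc(st_point(c(0, 0)), st_point(c(10, 0)))
dest <- st_sfc(st_point(c(1, 0)), st_point(c(9, 0)), st_point(c(50, 0)))

idx <- st_nearest_feature(orig, dest)           # index of nearest dest per origin
d   <- st_distance(orig, dest[idx], by_element = TRUE)

# Analogous call for the MSOA centroids (assumes cent.sp and pubs.spdf from above):
# idx <- st_nearest_feature(cent.sp, pubs.spdf)
# msoa.spdf$dist_pubs <- as.numeric(st_distance(cent.sp, pubs.spdf[idx, ],
#                                               by_element = TRUE))
```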
2.1.4 Intersections + Buffers
We might also be interested in the average air pollution (NO2) within a 1 km radius around each MSOA centroid. To do so, we first create a buffer with st_buffer() around each midpoint and subsequently use st_intersection() to calculate the overlap.
# Create buffer (1km radius)
cent.buf <- st_buffer(cent.sp, dist = 1000)
mapview(cent.buf)
# Add area of each buffer (in this case constant across buffers)
cent.buf$area <- as.numeric(st_area(cent.buf))
# Calculate intersection of pollution grid and buffer
int.df <- st_intersection(cent.buf, pol.spdf)
Warning: attribute variables are assumed to be spatially constant
throughout all geometries
int.df$int_area <- as.numeric(st_area(int.df)) # area of intersection
# Area of intersection as share of buffer
int.df$area_per <- int.df$int_area / int.df$area
# Aggregate as weighted mean
int.df <- st_drop_geometry(int.df)
int.df$no2_weighted <- int.df$no22011 * int.df$area_per
int.df <- aggregate(list(no2 = int.df[, "no2_weighted"]),
by = list(MSOA11CD = int.df$MSOA11CD),
sum)
# Merge back to spatial data.frame
msoa.spdf <- merge(msoa.spdf, int.df, by = "MSOA11CD", all.x = TRUE)
mapview(msoa.spdf[, "no2"])
Note: for buffer-related methods, it often makes sense to use population-weighted centroids instead of geographic centroids (see here for MSOA population-weighted centroids). However, this information is often not available.
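sf also bundles this intersect-weight-aggregate workflow in a single function: st_interpolate_aw() computes area-weighted means directly when extensive = FALSE. A self-contained sketch on toy polygons; for the London data the analogous call would be st_interpolate_aw(pol.spdf["no22011"], cent.buf, extensive = FALSE) (note that buffers not fully covered by the grid are weighted slightly differently than in the manual version above, which divides by the full buffer area):

```r
library(sf)

# Source grid: two 1x1 cells with values 10 and 30
g1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
g2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
grid <- st_sf(val = c(10, 30), geometry = st_sfc(g1, g2))

# Target polygon covering half of each cell -> area-weighted mean is 20
tgt <- st_sf(geometry = st_sfc(st_polygon(list(rbind(c(0.5, 0), c(1.5, 0),
                                                     c(1.5, 1), c(0.5, 1),
                                                     c(0.5, 0))))))

st_interpolate_aw(grid["val"], tgt, extensive = FALSE)
```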
2.1.5 Air pollution and ethnic minorities
With a few lines of code, we have compiled an original dataset containing demographic information, air pollution, and some infrastructural information.
Let’s see what we can do with it.
# Define ethnic group shares
msoa.spdf$per_mixed <- msoa.spdf$KS201EW_200 / msoa.spdf$KS201EW0001 * 100
msoa.spdf$per_asian <- msoa.spdf$KS201EW_300 / msoa.spdf$KS201EW0001 * 100
msoa.spdf$per_black <- msoa.spdf$KS201EW_400 / msoa.spdf$KS201EW0001 * 100
msoa.spdf$per_other <- msoa.spdf$KS201EW_500 / msoa.spdf$KS201EW0001 * 100
# Define tenure
msoa.spdf$per_owner <- msoa.spdf$KS402EW_100 / msoa.spdf$KS402EW0001 * 100
msoa.spdf$per_social <- msoa.spdf$KS402EW_200 / msoa.spdf$KS402EW0001 * 100
# Non British passport
msoa.spdf$per_nonUK <- (msoa.spdf$KS205EW0001 - msoa.spdf$KS205EW0003)/ msoa.spdf$KS205EW0001 * 100
msoa.spdf$per_nonEU <- (msoa.spdf$KS205EW0001 - msoa.spdf$KS205EW0003 -
msoa.spdf$KS205EW0004 - msoa.spdf$KS205EW0005 -
msoa.spdf$KS205EW0006)/ msoa.spdf$KS205EW0001 * 100
msoa.spdf$per_nonUK_EU <- (msoa.spdf$KS205EW0005 + msoa.spdf$KS205EW0006)/ msoa.spdf$KS205EW0001 * 100
# Run regression
mod1.lm <- lm(no2 ~ per_mixed + per_asian + per_black + per_other +
per_owner + per_social + pubs_count + POPDEN + ulez,
data = msoa.spdf)
# summary
screenreg(list(mod1.lm), digits = 3)
========================
Model 1
------------------------
(Intercept) 36.899 ***
(1.311)
per_mixed -0.078
(0.099)
per_asian 0.017 *
(0.007)
per_black -0.085 ***
(0.016)
per_other 0.469 ***
(0.047)
per_owner -0.205 ***
(0.013)
per_social -0.056 ***
(0.013)
pubs_count 0.220 ***
(0.040)
POPDEN 0.036 ***
(0.003)
ulez 9.358 ***
(0.678)
------------------------
R^2 0.773
Adj. R^2 0.771
Num. obs. 983
========================
*** p < 0.001; ** p < 0.01; * p < 0.05
For some examples later, we also add data on house prices. We use the median house prices in 2011 from the London Datastore.
# Download
hp.link <- "https://data.london.gov.uk/download/average-house-prices/bdf8eee7-41e1-4d24-90ce-93fe5cf040ae/land-registry-house-prices-MSOA.csv"
hp.df <- read.csv(hp.link)
hp.df <- hp.df[which(hp.df$Measure == "Median" &
grepl("2011", hp.df$Year)), ]
table(hp.df$Year)
Year ending Dec 2011 Year ending Jun 2011 Year ending Mar 2011
983 983 983
Year ending Sep 2011
983
# Aggregate across 2011 values
hp.df$med_house_price <- as.numeric(hp.df$Value)
hp.df <- aggregate(hp.df[, "med_house_price", drop = FALSE],
by = list(MSOA11CD = hp.df$Code),
FUN = function(x) mean(x, na.rm = TRUE))
# Merge spdf and housing prices
msoa.spdf <- merge(msoa.spdf, hp.df,
by = "MSOA11CD",
all.x = TRUE, all.y = FALSE)
hist(log(msoa.spdf$med_house_price))
2.1.6 Save spatial data
# Save
save(msoa.spdf, file = "_data/msoa2_spatial.RData")
2.2 Visualization
A large advantage of spatial data is that different data sources can be connected and combined. Another nice advantage: you can create very nice maps, and it's quite easy to do! Stefan Jünger & Anne-Kathrin Stroppe provide more comprehensive materials on mapping in their GESIS workshop on geospatial techniques in R.
Many packages and functions can be used to plot maps of spatial data. For instance, ggplot2 can plot spatial data using geom_sf(). I am personally a fan of tmap, which makes many steps easier (but is sometimes less flexible).
A great tool for choosing colours is, for instance, ColorBrewer. viridisLite provides another great resource for choosing colours.
2.2.1 Tmaps
For instance, let's plot the NO2 estimates using tmap + tm_fill() (there are lots of alternatives like tm_polygons(), tm_points(), tm_dots()).
# Define colours
cols <- viridis(n = 7, direction = 1, option = "C")
mp1 <- tm_shape(msoa.spdf) +
tm_fill(col = "no2",
style = "fisher", # algorithm to def cut points
n = 7, # Number of requested cut points
palette = cols, # colours
alpha = 1, # transparency
title = "NO2",
legend.hist = FALSE # histogram next to map?
) +
tm_borders(col = "white", lwd = 0.5, alpha = 0.5)
mp1
tmap allows us to easily combine different objects by defining a new layer via tm_shape().
# Define colours
cols <- viridis(n = 7, direction = 1, option = "C")
mp1 <- tm_shape(msoa.spdf) +
tm_fill(col = "no2",
style = "fisher", # algorithm to def cut points
n = 7, # Number of requested cut points
palette = cols, # colours
alpha = 1, # transparency
title = "NO2",
legend.hist = FALSE # histogram next to map?
) +
tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
tm_shape(ulez.spdf) +
tm_borders(col = "red", lwd = 1, alpha = 1)
mp1
And it is easy to change the layout.
# Define colours
cols <- viridis(n = 7, direction = 1, option = "C")
mp1 <- tm_shape(msoa.spdf) +
tm_fill(col = "no2",
style = "fisher", # algorithm to def cut points
n = 7, # Number of requested cut points
palette = cols, # colours
alpha = 1, # transparency
title = expression('in'~mu*'g'/m^{3}),
legend.hist = FALSE # histogram next to map?
) +
tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
tm_shape(ulez.spdf) +
tm_borders(col = "red", lwd = 1, alpha = 1) +
tm_layout(frame = FALSE,
legend.frame = TRUE, legend.bg.color = TRUE,
legend.position = c("right", "bottom"),
legend.outside = FALSE,
main.title = "NO2",
main.title.position = "center",
main.title.size = 1.6,
legend.title.size = 0.8,
legend.text.size = 0.8)
mp1
We can also add some map information from OSM. However, it's sometimes a bit tricky with the projection, which is why we switch to the OSM projection (EPSG:4326) here. Note that this OSM query is built on retiring packages.
# Save old projection
crs_orig <- st_crs(msoa.spdf)
# Change projection
ulez.spdf <- st_transform(ulez.spdf, 4326)
msoa.spdf <- st_transform(msoa.spdf, 4326)
# Get OSM data for background
osm_tmp <- read_osm(st_bbox(msoa.spdf), ext = 1,
type = "stamen-toner", zoom = NULL)
Please note that rgdal will be retired during October 2023,
plan transition to sf/stars/terra functions using GDAL and PROJ
at your earliest convenience.
See https://r-spatial.org/r/2023/05/15/evolution4.html and https://github.com/r-spatial/evolution
rgdal: version: 1.6-7, (SVN revision 1203)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 3.6.2, released 2023/01/02
Path to GDAL shared files: C:/Users/qtnztru/AppData/Local/R/win-library/4.3/rgdal/gdal
GDAL does not use iconv for recoding strings.
GDAL binary built with GEOS: TRUE
Loaded PROJ runtime: Rel. 9.2.0, March 1st, 2023, [PJ_VERSION: 920]
Path to PROJ shared files: C:/Users/qtnztru/AppData/Local/R/win-library/4.3/rgdal/proj
PROJ CDN enabled: FALSE
Linking to sp version:1.6-1
To mute warnings of possible GDAL/OSR exportToProj4() degradation,
use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
# Define colours
cols <- viridis(n = 7, direction = 1, option = "C")
mp1 <- tm_shape(osm_tmp) + tm_rgb() +
tm_shape(msoa.spdf) +
tm_fill(col = "no2",
style = "fisher", # algorithm to def cut points
n = 7, # Number of requested cut points
palette = cols, # colours
alpha = 0.8, # transparency
title = expression('in'~mu*'g'/m^{3}),
legend.hist = FALSE # histogram next to map?
) +
#tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
tm_shape(ulez.spdf) +
tm_borders(col = "red", lwd = 1, alpha = 1) +
tm_layout(frame = FALSE,
legend.frame = TRUE, legend.bg.color = TRUE,
legend.position = c("right", "bottom"),
legend.outside = FALSE,
main.title = "NO2",
main.title.position = "center",
main.title.size = 1.6,
legend.title.size = 0.8,
legend.text.size = 0.8)
mp1
tmap also makes it easy to combine single maps.
# Define colours
cols1 <- viridis(n = 7, direction = 1, option = "C")
# Define colours
cols2 <- viridis(n = 7, direction = 1, option = "D")
mp1 <- tm_shape(osm_tmp) + tm_rgb() +
tm_shape(msoa.spdf) +
tm_fill(col = "no2",
style = "fisher", # algorithm to def cut points
n = 7, # Number of requested cut points
palette = cols1, # colours
alpha = 0.8, # transparency
title = expression('in'~mu*'g'/m^{3}),
legend.hist = FALSE # histogram next to map?
) +
#tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
tm_shape(ulez.spdf) +
tm_borders(col = "red", lwd = 1, alpha = 1) +
tm_layout(frame = FALSE,
legend.frame = TRUE, legend.bg.color = TRUE,
legend.position = c("right", "bottom"),
legend.outside = FALSE,
main.title = "NO2",
main.title.position = "center",
main.title.size = 1.4,
legend.title.size = 0.8,
legend.text.size = 0.8)
mp2 <- tm_shape(osm_tmp) + tm_rgb() +
tm_shape(msoa.spdf) +
tm_fill(col = "per_black",
style = "fisher", # algorithm to def cut points
n = 7, # Number of requested cut points
palette = cols2, # colours
alpha = 0.8, # transparency
title = "% Black",
legend.hist = FALSE # histogram next to map?
) +
#tm_borders(col = "white", lwd = 0.5, alpha = 0.5) +
tm_shape(ulez.spdf) +
tm_borders(col = "red", lwd = 1, alpha = 1) +
tm_layout(frame = FALSE,
legend.frame = TRUE, legend.bg.color = TRUE,
legend.position = c("right", "bottom"),
legend.outside = FALSE,
main.title = "Ethnic Black inhabitants",
main.title.position = "center",
main.title.size = 1.4,
legend.title.size = 0.8,
legend.text.size = 0.8)
tmap_arrange(mp1, mp2, ncol = 2, nrow = 1)
And you can easily export those to PNG or PDF.
png(file = paste("London.png", sep = ""), width = 14, height = 7, units = "in",
res = 100, bg = "white")
par(mar=c(0,0,3,0))
par(mfrow=c(1,1),oma=c(0,0,0,0))
tmap_arrange(mp1, mp2, ncol = 2, nrow = 1)
dev.off()
png
  2
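Alternatively, tmap ships its own export helper, tmap_save(), which manages the graphics device for you (the filename here is just an example):

```r
# Save a single map or a tmap_arrange() object directly to file
tmap_save(tmap_arrange(mp1, mp2, ncol = 2, nrow = 1),
          filename = "London.png",
          width = 14, height = 7, units = "in", dpi = 100)
```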
2.2.2 ggplot
gp <- ggplot(msoa.spdf)+
geom_sf(aes(fill = no2))+
scale_fill_viridis_c(option = "B")+
coord_sf(datum = NA)+
theme_map()+
theme(legend.position = c(.9, .6))
gp
# Get some larger scale boundaries
borough.spdf <- st_read(dsn = paste0("_data", "/statistical-gis-boundaries-london/ESRI"),
layer = "London_Borough_Excluding_MHW" # Note: no file ending
)
Reading layer `London_Borough_Excluding_MHW' from data source
`C:\work\Lehre\Geodata_Spatial_Regression\_data\statistical-gis-boundaries-london\ESRI'
using driver `ESRI Shapefile'
Simple feature collection with 33 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 503568.2 ymin: 155850.8 xmax: 561957.5 ymax: 200933.9
Projected CRS: OSGB36 / British National Grid
# transform to only inner lines
borough_inner <- ms_innerlines(borough.spdf)
# Plot with inner lines
gp <- ggplot(msoa.spdf)+
geom_sf(aes(fill = no2), color = NA)+
scale_fill_viridis_c(option = "A")+
geom_sf(data = borough_inner, color = "gray92")+
geom_sf(data = ulez.spdf, color = "red", fill = NA)+
coord_sf(datum = NA)+
theme_map()+
labs(fill = "NO2")+
theme(legend.position = c(.9, .6))
gp
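For ggplot2 maps, the export counterpart is ggsave() (the filename is just an example):

```r
# Export the ggplot map; width/height are in inches by default
ggsave("London_no2.png", plot = gp, width = 7, height = 7, dpi = 300)
```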
2.3 Exercise
tbd